High School Graduation, Completion, 
and Dropout (GCD) Indicators 



1. Introduction 

High school completion is a fundamental educational process that holds important implications both for 
individuals and for educational systems. On the one hand, obtaining a high school diploma offers an 
individual a variety of advantages, including the expectation of more stable employment prospects, higher 
lifetime earnings, and the opportunity to continue one’s education at the postsecondary level. On the 
other hand, an educational agency — a state public education system, school district, or individual 
school — might seek to gauge its effectiveness by determining the percentage of its high school students 
that earn a diploma or fail to do so. While there is considerable consensus around the value of high 
school completion, there is surprisingly little agreement regarding the ways in which high school 
outcomes could or should be empirically measured. In fact, limited attention has been devoted even to 
mapping out the variety of statistical indicators used to empirically measure these outcomes. 

The purpose of this catalog is to provide a basic inventory of the various methods for estimating high 
school graduation, completion, and dropout (GCD) rates that are currently being used by federal 
governmental agencies and state education agencies (SEAs). It would not be possible in the context of 
this small-scale investigation to provide a complete accounting of all these methods. In the inventory, we 
identify and explicate over 70 distinctive GCD indicators. Given unlimited time and resources, it might 
very well have been possible to identify hundreds of statistical measures employed for purposes ranging 
from research to public information to accountability. Even at its present scale, however, we believe that 
this catalog serves to effectively document a considerable amount of the variability in contemporary 
approaches to measuring high school outcomes. 

A mere inventory of statistical indicators will be of limited use without some means of systematically 
describing the salient aspects of these measures and classifying the various indicators into conceptually 
and practically meaningful categories. In fact, a lack of uniformity in the use of concepts and technical 
terminology has been something of a barrier to clearly communicating information and statistics related to 
high school completion and dropout. In some contexts, for instance, a meaningful distinction might be 
drawn between an individual who “completes” high school and one who “graduates.” At other times these 
potentially distinct outcomes are treated interchangeably. A given technical expression can in turn be 
used in an inconsistent, unclear, imprecise, excessively broad, or overly narrow manner. The term 
“cohort,” for example, may carry different connotations to researchers versus practitioners, to researchers 
in various disciplines, to different governmental agencies, or to a single agency depending on the context 
or application. 

It will not be the goal of this report to devise a comprehensive or definitive taxonomy of statistical 
approaches to measuring GCD rates. Such an endeavor, if it were to be pursued conscientiously, would 
represent a major undertaking unto itself. However, a working classification system for these types of 
indicators is necessary for this report. Indeed, the act of constructing a catalog composed of uniformly 
structured entries for GCD indicators would itself imply the presence of at least a provisional scheme for 
conceptually identifying and organizing relevant information about those indicators. 

This conceptual portion of the present investigation proceeds in a series of steps and constitutes what 
might be thought of as a primer on GCD indicators. 

> Outlining a basic conceptual framework that will assist in thinking rigorously about high school 
completion processes and addressing the challenges associated with empirically measuring 
graduation, completion, and dropout rates. 

> Utilizing this provisional framework in order to derive a set of categories for classifying various 
kinds of empirical GCD indicators. These categories will employ extant terminology where it is 
useful but will also seek to clarify ambiguities in current usage wherever possible. 

> Explaining key methodological issues associated with the measurement of GCD rates at a level 
of specificity that will be meaningful to researchers but also accessible to policymakers, 
educators, and other general audiences concerned with this issue. 

> Applying this provisional classification scheme in a consistent manner to develop catalog entries 
for actual GCD rate indicators. In other words, the conceptual stage of this investigation will 
directly guide the concepts and terminology used in the indicator catalog as well as the 
organization of the entries themselves. 



The remainder of this report is organized as follows. Section 2 provides a brief discussion of the high 
school processes and outcomes of interest in this catalog. Section 3 describes various types of data 
systems from which an indicator can be developed. These data systems are represented visually using a 
contingency table approach that closely resembles the kinds of aggregated, group-level data from which 
many agencies calculate GCD rates. Section 4 discusses different kinds of “rates” and the way in which 
data considerations intersect with mathematical formulations to define a statistical indicator. Section 5 
offers a brief overview of several of the major sources of data on high school outcomes. These are the 
sources from which the most prominent, publicly reported findings on high school graduation, completion, 
and dropout are derived. Section 6 describes the review process used to identify indicators included in 
the catalog as well as the format of the indicator entries themselves. Finally, Section 7 presents the 
individual GCD indicator entries. 

2. High School Outcomes 

This catalog is concerned with the measurement of certain high school outcomes. In particular, we are 
interested in the progression of students through high school and the way in which students terminate 
their participation with the secondary education system. This process of entering, advancing through, and 
eventually leaving high school can be represented in more or less elaborate ways. Exhibit 1 below offers 
one very simple, perhaps naive, way to depict a student’s path. Although simplistic, the key distinction 
drawn here between students who drop out of high school and those who graduate is one that resonates 
with common-sense ways of thinking about high school completion. Often this basic logic also carries 
over into efforts to measure and report on these outcomes. 



In real life, of course, teenagers can follow many paths through high school. In some cases, two students 
may start and end in the same places but take strikingly divergent courses between those two points. 
One student, for example, has struggled with school her whole life and was held back several times in 
elementary and middle school. She encounters further academic difficulties immediately upon entering 
high school and drops out before finishing the ninth grade. She never returns to high school, but passes 
the GED a decade later and obtains a high school equivalency credential that offers the hope of 
advancement in the workplace. A second student progresses normally through high school until the 
middle of his junior year. At this point, the death of a parent sends his family into difficult financial straits 
and the student takes a job at night to help support his family. Although he tries to stay in school, the 
demands of a full-time job at night and attending school during the day become overwhelming and he 
drops out. Over the next year and a half, he makes several short-lived attempts to reenroll and finish high 
school but drops out each time. Finally, this student decides to take the GED and passes. A high school 
equivalency allows him to enroll at the local community college, where the flexible schedule allows him to 
work toward a college degree while keeping his job. 



Exhibit 1: Simple Paths through High School 

[Flow diagram: students in High School leave through one of two outcomes, Dropout or Graduate.] 






The Urban Institute / Education Policy Center 



GCD Indicator Catalog 



Both of these hypothetical students started high school, dropped out, and eventually obtained high school 
equivalencies. But beyond these few common points, their experiences diverged. The first student left 
high school and never returned, a relatively unambiguous case of a dropout. The second student, 
however, returned to school on several occasions before leaving for the last time. Snapshots taken at 
different points during this period might alternatively show this second teenager as an enrolled high 
school student or a dropout. 



Exhibit 2: Less Simple Paths through High School 

Exhibit 2 depicts a more elaborate set of paths through high school. While probably still a simplification, 
this diagram comes closer to representing the full range of experiences teenagers actually encounter. 1 
This new depiction, for example, recognizes that dropping out and returning to school are events that can 
occur multiple times. In addition, our first diagram’s short-hand identification of students who finish high 
school as “graduates” has been considerably expanded in Exhibit 2. Here we find a distinction between 
students who have completed a regular high school program and those who have not. Among the 
broader group of Completers, some earn regular diplomas (Graduates) while others obtain alternative, 
non-diploma credentials. These latter Alternative Completers might include students who fulfill course- 
taking requirements but fail to pass a high school exit exam (receiving a certificate of attendance) or 
special education students who meet goals established in their Individualized Education Programs 
(perhaps receiving an IEP or special education certificate). 

In addition to a distinction between program completers and non-completers, this more elaborate 
illustration of paths through secondary level education also calls attention to the different types of high 
school credentials an individual might obtain. Program completion and receipt of credentials are often 
discussed in ways that suggest the terms are interchangeable. As we see here, however, this distinction 
is not always so clear. Certain credentials, like regular diplomas, generally do imply the successful 
completion of a full secondary-level program of study. But other credentials, like certificates of 
attendance, might be awarded for meeting only some of the requirements of a standard high school 
program or for the completion of a non-standard program. Individuals may also earn high school 
credentials through a test-based equivalency like the GED that occurs independently of completing or 
even attending a secondary education program. 2 

The graduation, completion, and dropout rate indicators at the center of this study are intended to 
measure the prevalence of the various high school outcomes depicted in Exhibit 2. As the discussions 
above suggest, the paths students follow through high school are more numerous, complicated, and 
interconnected than is often acknowledged. In addition, the terms commonly used to describe these 
various outcomes are often applied in inconsistent or imprecise ways. It will be necessary, therefore, to 
explicitly define this terminology as it will be used in this report. We will define the key statuses or 
outcomes of the secondary education process as follows. 

> Enrolled Student. An individual who is currently attending a formal education program at the 
elementary or secondary level. This would not include individuals enrolled in adult education 
programs, even if they are of high school age. 

> Not Enrolled. An individual who is not currently receiving formal educational services at the 
elementary or secondary level. This group might include those who were Never Enrolled in a 
particular educational system of interest (e.g., adult immigrants) as well as those who had been 
enrolled at some point in the past. Members of either group can be further classified on the basis 
of their past educational experiences. In the present context, we will be interested in the way in 
which these individuals left the elementary/secondary educational system (e.g., as completers 
versus dropouts). 



1 This diagram builds on a figure that has been presented in Department of Education publications in the past (see 
the Condition of Education 1986 for an example). 

2 The abbreviation GED stands for General Educational Development, a series of tests administered by the American 
Council on Education. Individuals who pass this battery of assessments and meet other state-established 
qualifications may receive a high school credential. For more information on the GED, see Who Passed the GED 
Tests? 2002 Statistical Report, American Council on Education, Washington D.C., 2004. 

> High School Program Completer. An individual who has received a high school credential as 
the result of fulfilling the requirements of a standard or modified secondary educational program. 
An individual can complete high school through two major routes. 

o Graduate. A student who receives a standard high school diploma as the result of 
fulfilling all of the requirements of a standard secondary educational program. The 
particular set of conditions that students must meet to earn a diploma (or other credential) 
may be defined independently by local authorities within a broader system. For example, 
individual state education agencies within the United States are responsible for 
determining the course-taking requirements, test score performance, and other criteria 
associated with a high school diploma within their respective jurisdictions. 

o Alternative Completer. A student who obtains a non-diploma credential for completing 
a formal secondary educational program that does not meet the requirements necessary 
to be awarded a standard diploma. The group of alternative completers could potentially 
be further subdivided based on the particular type of alternative program or credential of 
interest (e.g., certificate of attendance, IEP certificate). The availability and types of non- 
diploma credentials vary across local jurisdictions. 

> High School Program Non-Completer. An individual who is not currently enrolled in school at 
the elementary or secondary level and has not completed a formal high school program. This 
group of non-completers includes both individuals who have earned some kind of high school 
credential and those who have not. 

o Dropout. An individual who is not currently receiving educational services at the 
elementary or secondary level and who has not completed a high school program or 
otherwise received a high school credential. This group might include individuals who 
never attended school at the secondary level (e.g., those who dropped out before ninth 
grade). 

o Equivalency Recipient. A non-enrolled individual who has not completed a secondary 
educational program but has obtained a high school credential through a test-based 
equivalency such as the GED. 
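
The definitions above can be read as a simple decision rule. The Python sketch below is one illustrative encoding and is not drawn from any agency's data system: the function name, the argument names, and the credential labels are assumptions introduced here. It also sets aside the Never Enrolled group, which falls outside the statuses listed above.

```python
from typing import Optional


def classify_status(enrolled: bool,
                    completed_program: bool,
                    credential: Optional[str]) -> str:
    """Map one individual's record to a status defined in this section.

    credential is one of None, "diploma", "alternative", or "equivalency"
    (hypothetical labels chosen for this sketch).
    """
    if enrolled:
        return "Enrolled Student"
    if completed_program:
        # Program completers are split by the type of credential earned.
        if credential == "diploma":
            return "Graduate"
        return "Alternative Completer"
    # Program non-completers: with or without a test-based credential.
    if credential == "equivalency":
        return "Equivalency Recipient"
    return "Dropout"
```

Under this rule an individual who left school without finishing a program and later passed the GED is returned as an Equivalency Recipient rather than as a Dropout or a Completer, which keeps the equivalency category visible instead of folding it into another group.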



Although there are other possible ways of classifying high school outcomes, the scheme described above 
provides a useful foundation for approaching the study of GCD rates. One particular feature of this 
classification, however, should be mentioned. Generally speaking, some conceptualizations of high 
school outcomes are primarily oriented around the issue of credential attainment while others are more 
concerned with an individual’s participation in or completion of a formal educational program. The former 
distinction is often found in studies of educational attainment levels among members of an adult 
population. The latter features prominently in investigations where program completion constitutes an 
important indicator of the performance of particular (public) education systems. 

Both of these perspectives on high school outcomes are valid, although they are not entirely compatible 
with one another. As noted earlier, program non-completers include individuals without high school 
credentials and those who have obtained credentials through an equivalency (rather than through a 
formal secondary education program). This raises the dilemma of whether or not equivalency recipients 
should be considered to have received a high school education in the broader sense. In practice, the high 
school equivalency is generally not treated as an independent empirical category in its own right. Instead, 
recipients of these credentials tend to be subsumed under the definitions of either dropouts or alternative 
completers. Often the approach to classifying those who receive high school equivalencies is driven less 
by conceptual considerations than by practical necessities (e.g., it may not be possible to reliably 
distinguish among different types of high school credentials in a survey or database). 

The current study will not attempt to resolve this question, which is at least in part a normative one 
revolving around the relative values of various high school credentials. Instead, we will attempt to chart a 
middle course. In keeping with conventional practice, GCD rates will be described primarily in terms of the 
associated credential. However, we will also indicate explicitly how recipients of equivalencies are defined 
in the context of a given indicator (particularly those that involve dropouts and alternative completers). 

3. Representing Data 

3.1. Types of Databases 

Measuring the rates at which individuals experience certain kinds of events, such as dropping out of or 
graduating from high school, first requires clear definitions of these outcomes and an appreciation of the 
ways in which they are related to one another. The next step in this process involves developing an 
understanding of the kinds of data that will allow us to move from a conceptual representation to an 
empirical measurement of these outcomes. The types of empirical indicators that can be devised to 
measure high school graduation, completion, and dropout are highly dependent on the kinds of data 
available. 

A particular approach to empirical measurement, for instance, may imply some minimal data 
requirements with respect to level of informational detail (e.g., grain size), unit of measurement (e.g., 
individual students versus educational organizations), or ability to follow individual units over 
time (e.g., a student tracking system). Information systems could be described on the basis of these and 
potentially many other dimensions relevant to the task of calculating GCD indicators. For the purposes of 
the current investigation, however, three main classes of data systems will be of interest: cross-sectional, 
repeated cross-sectional, and individually tracked data. 

3.1.1. Cross-Sectional Data 

Information systems consisting of cross-sectional data provide a snapshot of conditions in a population of 
interest at a single point in time. Cross-sectional data are used primarily to characterize contemporary 
group-level conditions. A population is a set of elements or units that are of particular interest from a 
theoretical, empirical, or practical perspective. The constituent units of which a population is composed 
may be either individual (e.g., students) or collective (e.g., schools) in nature. In a cross-sectional data 
system, a population of interest is defined and observed at a single point in time. It is often useful to 
distinguish between cross-sectional data systems that (1) consist of contemporary information only, 
where all measurements and variables are anchored to the time of observation, and (2) include 
retrospective information, where some measurements capture conditions at some point prior to the time 
of observation. Retrospective information might be embedded in a cross-sectional data system, for 
instance, by asking respondents in a cross-sectional survey to answer questions about their past 
experiences. 

3.1.2. Repeated Cross-Sectional Data 

A data system composed of repeated cross-sectional data collections allows analysts to examine group- 
level trends for a focal population. This type of data system might be thought of as a series of multiple, 
independent cross-sectional databases. Here the population of interest is defined and then observations 
of that population are taken at multiple points in time. The conceptual definition of the population remains 
the same across the set of observations, but the specific individuals being observed may change over 
time. For example, we might identify 18-year-olds as our population of interest. Then suppose we create a 
repeated cross-sectional database by (1) counting the number of 18-year-olds with a high school diploma 
at a given point in time (year y) and then (2) counting the number of 18-year-olds with a high school 
diploma again three years later (y+3). The population of interest here remains the same (18-year-olds). 
But the actual individuals being observed are not the same (i.e., the 18-year-olds from year y will be 
21-year-olds three years later when the second measurement is taken). Some databases may define the 
general population of interest in relatively broad terms, such as adults 18 years of age and older. In these 
situations it may also be possible to identify more narrowly defined subpopulations, such as individuals 
who are a particular age (e.g., 18, 19, 20, 21 and so forth). Here, repeated cross-sectional observations 
may allow analysts to track characteristics of a group of individuals over time, for example by comparing 
18-year-olds in year y with 21-year-olds in year y+3. 

3.1.3. Individually Tracked Data 

The third type of data system involves tracking the individual members of a population over time. 
Information systems of this kind are referred to using a variety of terms, such as longitudinal databases or 
single record systems. To construct this type of database, we would define the population of interest, 
identify individual members of that population, and record observations of those specific individuals at 
multiple points in time. That is, individual members of a population are tracked prospectively over a period 
of time. This enables the measurement of change at an individual level, rather than just the observation of 
trends for the population as a whole or for more narrow subgroups. A longitudinally tracked database is 
quite flexible, since it can also be used to calculate group-level trends or to produce a cross-sectional 
snapshot of the characteristics of a population. 

3.1.4. Capturing Temporally Linked Information 

The ways in which temporally anchored information might be captured represent a key distinction among 
the different types of data systems described above. Suppose, for example, we were interested in the 
percentage of individuals who earned a high school diploma at an older-than-average age (between the 
ages of 19 and 21 for our present purposes). One way to estimate that figure would be using individually 
tracked data. In this case, we would survey a sample of 18-year-olds to determine how many had 
received a high school diploma and then resurvey that same group of individuals three years later to see 
how many additional diplomas had been earned in the interim. Alternatively, we could approximate the 
same figure using a single cross-sectional data collection that contains retrospective information. For 
example, we could survey a group of 21-year-olds and ask the following two questions: (1) Do you 
currently have a high school diploma? and (2) Did you earn a diploma by age 18? Respondents who 
answer “yes” to the first question and “no” to the second would be considered late or over-age graduates. 
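
The retrospective approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the sample responses are hypothetical, and the share is computed over all surveyed 21-year-olds, which is only one possible choice of denominator.

```python
def late_graduate_share(responses):
    """Share of respondents who hold a diploma now but did not by age 18.

    responses: list of (has_diploma_now, had_diploma_by_18) boolean pairs,
    one pair per surveyed 21-year-old (hypothetical survey layout).
    """
    if not responses:
        raise ValueError("at least one response is required")
    # "Yes" to question 1 and "no" to question 2 marks a late graduate.
    late = sum(1 for now, by_18 in responses if now and not by_18)
    return late / len(responses)


# Made-up responses used only to show the calculation.
sample = [(True, True), (True, False), (False, False), (True, True)]
print(late_graduate_share(sample))  # 0.25
```

An individually tracked data system would reach a comparable figure by comparing each person's diploma status at age 18 with the same person's status at age 21, without relying on recall.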

It is also worth noting that the identification of population units of interest can be, to some extent, a 
relative distinction. This introduces another type of temporal distinction, particularly in databases that 
might support multiple levels of analysis. Suppose, for example, that a repeated cross-sectional database 
consists of information about students at its most basic level (e.g., counts of dropouts or graduates) but 
that the identity of individual students cannot be linked from one observation point to another. However, 
suppose that this data system possesses a multilevel structure such that the basic elements (e.g., 
students) are nested within identifiable higher-level units (e.g., schools) that can be consistently identified 
and followed over time. Such a database would possess a temporal or longitudinal dimension, although 
only at the aggregate school level. 

3.2. The Contingency Table Approach 

The flow diagrams presented earlier to describe the paths a student might take through high school are 
useful for a variety of reasons. For example, these diagrams represent the progress of an individual 
through a series of stages. This helps to identify important categories and processes of interest and the 
ways they may be interrelated. These insights are valuable when developing an empirical indicator. 

Another way to represent similar kinds of information would be in tabular form. In particular, contingency 
tables offer a means of visually presenting aggregate data in a form that closely resembles the structure of 
actual databases often used to calculate graduation, completion, and dropout rates. An example of a very 
basic contingency table can be found in Exhibit 3. 



Exhibit 3: Basic Contingency Table 

[Table schematic: rows indexed i = 1 to I, columns indexed j = 1 to J; each cell holds the count N_{ij}.] 



A contingency table is an n-dimensional table where cell values represent the frequency (i.e., count) of 
cases that fall into a particular cross-classification of the categories defined on the basis of rows, 
columns, and higher-order (nth) dimensions. Our example is a simple two-dimensional table where row 
categories are indexed by i and column categories are indexed by j. So, the count of cases appearing in 
a particular table cell would be represented as N_{ij}. 
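
As a rough illustration of how such a table arises from case-level records, the Python sketch below tallies cases by their row and column categories. The function name and the category labels are placeholders invented for this example; only the counting logic matters.

```python
from collections import Counter


def contingency_table(cases):
    """Return cell counts keyed by (row category, column category)."""
    return Counter(cases)


# Each case is a (row category, column category) pair; labels are placeholders.
cases = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
table = contingency_table(cases)

print(table[("a", "x")])  # 2, the count for row "a" and column "x"
print(table[("b", "y")])  # 0, an empty cell
```

Aggregate files reported by agencies typically contain only the cell counts, not the underlying cases, which is why the table form resembles the data actually available for calculating rates.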

3.2.1. Contingency Table for High School Progression 

The progression of students through high school can be represented using a contingency table approach. 
We start with a basic two-way table in which rows represent grades and columns represent school years 
(Exhibit 4). It should be noted that one of the dimensions of this table is defined in terms of time. As a 
result, an individual student can only be counted once in each column. That is, a student can only be 
enrolled in one grade at a given point in time. 3 Further, students will be counted once in each column, 
assuming they remain members of the population or system represented by the table for multiple years. 



Exhibit 4: Table of High School Enrollment 

                         School Year (y = y to y+3) 
Grades (g = 9-12)    y            y+1           y+2           y+3 
12                                                            N_{12,y+3} 
11                                              N_{11,y+2} 
10                                N_{10,y+1} 
9                    N_{9,y} 









Cell values in this example (N_{g,y}) would represent the number of students enrolled in a school system at a 
particular secondary grade level (g) at the beginning of a particular school year (y). Students progressing 
on time through high school would be expected to complete one grade per school year. For example, 
students who started the ninth grade in year y and are promoted on schedule would appear in the main 
diagonal cells of the table (i.e., N_{9,y}, N_{10,y+1}, N_{11,y+2}, N_{12,y+3}). Individuals not progressing normally would 
appear in the off-diagonal cells of the table. This might include students retained or held back because 
they failed a grade (e.g., N_{9,y+1}) or students who skip a grade (N_{11,y+1}). 
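
The diagonal logic can be expressed as a short rule. The sketch below is illustrative only: the function name and labels are assumptions, and each cell is classified relative to the group that began ninth grade in year y. In an aggregate table a given cell may also contain students from other entering groups.

```python
def progression_type(grade, year_offset, start_grade=9):
    """Classify the cell for `grade` in year y + year_offset.

    The reference group starts `start_grade` in year y and is expected
    to advance one grade per school year.
    """
    expected_grade = start_grade + year_offset
    if grade == expected_grade:
        return "on time"        # main diagonal
    if grade < expected_grade:
        return "retained"       # behind the expected grade
    return "skipped ahead"      # ahead of the expected grade


print(progression_type(10, 1))  # on time
print(progression_type(9, 1))   # retained
print(progression_type(11, 1))  # skipped ahead
```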



3 The properties of a contingency table, therefore, depend on the properties of its dimensions. Another table, for 
instance, might contain rows and columns that are respectively composed of a set of mutually exclusive categories. 
In that case, each individual person enumerated would appear in one and only one cell in the entire table (i.e., in one 
row-by-column cross-classification). 

3.2.2. Extending the Contingency Table to Include High School Outcomes 

The grade progression examples above introduce the basic logic of a contingency table that begins to 
capture student paths through high school. That basic representation, however, does not capture the 
outcomes of greatest interest for our current purposes — graduation, completion, and dropout. To 
incorporate these outcome statuses into a contingency table format, we must expand the number of table 
rows and define these categories in a somewhat different manner. 

In the above example (Exhibit 4), row values correspond to grade levels while columns represent 
years. Although in both of these cases values can be expressed in numerical terms, this need not be the 
case. In fact, the dimensions of a contingency table often capture strictly categorical (or nominal) 
distinctions that imply no numerical ordering. A table might, for instance, classify the number of 
automobiles found in a parking lot on the basis of their color (red, blue, green, other) and manufacturer 
(Honda, Chevy, Subaru, other). 

In Exhibit 5, we incorporate high school outcomes into our contingency table logic by redefining row 
categories based on a combination of two pieces of information about an individual. 

> Current School Enrollment Status (enrolled vs. not-enrolled/leaver) 

> Most Recent Grade Attended (grades 9 through 12) 

These distinctions could alternatively have been represented as independent second and third table 
dimensions. To minimize the complexity of the table, however, we have chosen to construct status 
categories that are themselves combinations of two sub-dimensions. It should also be noted that the 
population of interest here consists of individuals who have been enrolled in the system at some point 
during the period of observation (i.e., never-enrolled individuals are outside the scope of observation). 

With respect to one’s current status, an individual may be either enrolled in school or no longer in school. 
It should be noted that being able to identify a no-longer-enrolled individual as a leaver implies having 
access to some information regarding that person’s past status. That is, leaver status at a particular point in 
time (e.g., y+1) is effectively defined on the basis of having experienced an event between two 
observation points (e.g., y and y+1). As before, the second dimension in this contingency table (i.e., 
columns) remains a temporally defined one — the school year during which the observation is made. 

This example also introduces a shift in the notation convention. Here an abbreviation for an individual’s 
status (e.g., E for enrolled or L for leaver) replaces N as a signifier for the cell count. The row subscript 
remains the grade level. So, now E 9,y would be used to represent the count of students enrolled (E) in
grade 9 for year y (rather than N 9,y used earlier). The notation L y+1 would represent the number of
formerly enrolled students now counted as leavers (i.e., individuals enrolled at time y but not enrolled at
time y+1).
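
The notation can be made concrete with a small sketch. The Python below is illustrative only: the record layout (a hypothetical student ID mapped to the grade attended in each year) and the function names are assumptions, not features of any data system described in this report. It tallies the enrolled counts by grade and year, and counts leavers as students enrolled at time y but not at time y+1.

```python
from collections import Counter

# Hypothetical student-level records: student id -> {year offset: grade}.
# Year offset 0 stands for y, 1 for y+1. A missing year means "not enrolled".
records = {
    "s1": {0: 9, 1: 10},
    "s2": {0: 9},          # enrolled at y, absent at y+1: a leaver
    "s3": {0: 10, 1: 11},
    "s4": {1: 9},          # first appears in the system at y+1
}

def enrollment_counts(records):
    """Return a Counter keyed by (grade, year): the enrolled cell counts."""
    counts = Counter()
    for history in records.values():
        for year, grade in history.items():
            counts[(grade, year)] += 1
    return counts

def leaver_count(records, year):
    """Count students enrolled at year-1 but not enrolled at `year`."""
    return sum(
        1 for history in records.values()
        if (year - 1) in history and year not in history
    )

E = enrollment_counts(records)
print(E[(9, 0)])                 # grade 9 enrollment at time y
print(leaver_count(records, 1))  # leavers observed at time y+1
```

Note that student s4 is never counted as a leaver at y+1, because leaver status requires a prior observation of enrollment.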

This table could be further expanded to represent specific kinds of school leavers of interest, which might 
include Graduates (diploma recipients), Alternative Completers (completers with non-diploma 
credentials), Dropouts (school leavers without credentials), and Outflow. The latter category would refer to 
students who leave the system without completing or dropping out (e.g., due to transfer or death). Given 
a sufficiently sophisticated data collection and information system, each of these leaver categories could 
also be distinguished based on the last grade attended, much in the way we distinguish currently enrolled 
students by their grade level. In such a data system, for example, D 9,y+1 would represent students who
dropped out (D) of the ninth (9) grade between y and y+1. That is, these individuals were counted as 
ninth graders at time y and dropouts at time y+1. Capturing this level of detail would generate a very large 
table with 20 rows, one for each grade-by-status combination. 

Exhibit 5: High School Enrollment and Leaving

Status categories: Enrolled (grades 9-12); Not Enrolled (Leavers)

                                School Year
Status       y           y+1         y+2         y+3         y+4
Grade 12                                         E 12,y+3
Grade 11                             E 11,y+2
Grade 10                 E 10,y+1
Grade 9      E 9,y
Leaver                   L y+1       L y+2       L y+3       L y+4



Many real-world data collection systems, however, do not support this level of detailed reporting. For 
instance, a school district may record annual information about grade-by-grade enrollment and dropout 
counts; total counts of diploma recipients and alternative completers (i.e., not disaggregated by grade); 
and no information at all about other kinds of Outflow from the system. This particular example essentially 
describes the district-level data on high school enrollment, completion, and dropout currently collected 
through the U.S. Department of Education’s Common Core of Data (CCD) data system (see footnote 4). Exhibit 6
represents this information in contingency table form. 

The notation in this table is similar to the previous exhibit. For instance, G y+4 would represent the total 
number of diploma recipients (G) observed at time y+4. More specifically, these are the diplomas 
awarded between the y+3 to y+4 observations (i.e., during the school year that ended in the spring of 
year y+4). As discussed above, this particular information system is able to capture enrollment and 
dropout counts for specific grade levels. The notation associated with these statuses contains subscripts 
reflecting both grade level and year (e.g., D g,y). However, since we only observe the total numbers of
graduates and alternative completers, we omit the grade-specific subscript for these outcomes (e.g.,
G y+4). In similar situations, alternative notational styles might use a placeholder symbol in a subscript to represent



4 The Common Core of Data (CCD), conducted by the U.S. Department of Education, is an annual census survey of 
public sector local educational agencies (districts) and schools for the fifty states, the District of Columbia and several 
other non-state jurisdictions. Annual surveys of basic demographic and educational information at the state, district, 
and school levels are completed by staff of the respective state education agencies. 

disaggregated categories that are not observed or marginal frequencies that sum counts across a set of
observed categories. Although we omit this subscript in the interests of simplifying our notation, it should
be noted that the same information (e.g., the total number of high school dropouts for a particular year)
could be expressed in a variety of equivalent ways (e.g., D y = [D 9,y + D 10,y + D 11,y + D 12,y]).
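
As a minimal sketch of this equivalence, the Python below stores counts in a CCD-style layout and recovers a yearly total by summing across grades. The dictionary keys, the use of None for an unobserved grade, and all of the counts are hypothetical choices made for illustration.

```python
# Hypothetical CCD-style counts keyed by (status, grade, year offset).
# Grade is None where only a total is reported (graduates, completers).
table = {
    ("D", 9, 1): 40, ("D", 10, 1): 35, ("D", 11, 1): 30, ("D", 12, 1): 25,
    ("G", None, 1): 410,
    ("A", None, 1): 20,
}

def total(table, status, year):
    """Sum a status across every grade entry reported for one year."""
    return sum(n for (s, _grade, y), n in table.items()
               if s == status and y == year)

print(total(table, "D", 1))  # marginal sum of the four grade-specific cells
print(total(table, "G", 1))  # a single undifferentiated total
```

The same function serves both cases: a status reported by grade is summed over its grade cells, while a status reported only in total passes through unchanged.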



Exhibit 6: Enrollment, Dropout, and Completion Table
(modeled after the Common Core of Data)

Status categories: Enrolled (9-12); Dropout (9-12); Graduate (diploma); Alternative Completer (other credential)

                                          School Year
Status                  y           y+1         y+2         y+3         y+4
Grade 12                                                    E 12,y+3
Grade 11                                        E 11,y+2
Grade 10                            E 10,y+1
Grade 9                 E 9,y
Dropout 12                                                              D 12,y+4
Dropout 11                                                  D 11,y+3
Dropout 10                                      D 10,y+2
Dropout 9                           D 9,y+1
Graduates                                                               G y+4
Alternative Completers                                                  A y+4

3.2.3. Adding a Final Dimension: Group Membership

So far, the contingency table examples discussed above have all been two-dimensional, capturing (1) an 
individual’s status and (2) time of observation. Data from such an information system could be used to 
construct GCD indicators in the aggregate — that is, to calculate GCD rates for the population as a whole. 
However, it is often desirable from an analytic perspective (and sometimes mandated by law) that these 
rates be disaggregated for specific subgroups of individuals. Doing so would require that every 
constituent data element used in a GCD indicator be broken down separately for each subgroup of 
interest. 

We can represent the type of data system necessary to construct disaggregated GCD indicators by 
expanding our contingency tables through the addition of a third dimension capturing group membership. 
Individuals may be identified as members of groups based on any number of characteristics, ranging from 
demographics (gender, race/ethnicity), to socioeconomic status (parental education level, family income), 
to educational classification (special education, English language learner), to academic performance 
(having received a passing or failing score on a state assessment). Additional higher-order dimensions 
could also be used to represent simultaneous membership in multiple groups, such as specific
race-by-gender categories (e.g., Latino females, Asian males).

Even a third dimension, however, can be difficult to depict visually. In the interest of clarity we will 
consider only a single group membership distinction, adding a third dimension to our contingency table 
(but not a fourth). Further, a simple example will be chosen in which group membership can be expressed 
using only two categories. Specifically, we will consider membership in gender-defined categories 
(female, male). Perhaps the easiest way to conceptualize this three-dimensional table is as a pair of 
layers. As shown in Exhibit 7, each of these layers can be represented as a two-dimensional table — one 
for females, one for males. 

Within each gender-specific table, rows represent grade-level enrollment status and columns represent 
school years. These individual layers could be “stacked” on top of one another to form a
three-dimensional data matrix. Using the notation style introduced earlier, the count of individuals represented
in each three-dimensional cell defined on the basis of grade (g), year (y), and group membership (m)
would be expressed as E g,y,m (see footnote 5).
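
The layered structure can be sketched in a few lines of Python. The counts, the group labels, and the helper names below are hypothetical. The sketch shows how one layer is extracted for a single group and how summing over the group dimension recovers the aggregate two-dimensional table.

```python
# Hypothetical enrollment counts keyed by (grade, year offset, group).
enrollment = {
    (9, 0, "female"): 210, (9, 0, "male"): 230,
    (10, 1, "female"): 200, (10, 1, "male"): 205,
}

def layer(enrollment, group):
    """Return the two-dimensional (grade, year) table for one group."""
    return {(g, y): n for (g, y, m), n in enrollment.items() if m == group}

def collapse_groups(enrollment):
    """Sum over the group dimension to recover the aggregate table."""
    totals = {}
    for (g, y, _m), n in enrollment.items():
        totals[(g, y)] = totals.get((g, y), 0) + n
    return totals

print(layer(enrollment, "female"))
print(collapse_groups(enrollment)[(9, 0)])
```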

3.2.4. A Special Type of Group — The Student Cohort 

Educational research often focuses considerable attention on demographically defined groups of students 
toward whom important policies or funding mechanisms are targeted. This might include students living in 
poverty or members of historically disadvantaged racial-ethnic groups. In the study of high school 
enrollment and completion, another specific type of group is often of particular interest — student “cohorts.” 



5 To maintain consistency with the abbreviation conventions used in the GCD indicator entries presented later in this 
report, the letter “m” will be used in subscripts to index group membership. It is coincidental that one of the gender 
groups in the present example (males) is also abbreviated using the same letter. 

Exhibit 7: A Three-Dimensional Table
Enrollment by Year by Gender

Group = Female
             y             y+1           y+2           y+3
Grade 12                                               E 12,y+3,f
Grade 11                                 E 11,y+2,f
Grade 10                   E 10,y+1,f
Grade 9      E 9,y,f

Group = Male
             y             y+1           y+2           y+3
Grade 12                                               E 12,y+3,m
Grade 11                                 E 11,y+2,m
Grade 10                   E 10,y+1,m
Grade 9      E 9,y,m

[Diagram: the Female and Male tables stacked as layers along the group dimension]

Taken in the most generic sense, a cohort is simply a set of individuals who share a common
characteristic, trait, or feature. When used in the context of demography or educational research, 
however, the term generally takes on a more specific meaning. Typically a cohort will refer to a group of 
individuals whose shared characteristic involves having experienced a common event during the same 
period of time. For example, demographic research often examines birth (or age) cohorts, which consist 
of individuals born during the same year. Of course, the same cohort of individuals might be identified in 
somewhat different ways over a period of time as it collectively experiences a series of temporally linked 
events. Members of the 1990 birth cohort, for instance, would become teenagers (i.e., 13-year-olds) in
the year 2003. 

The field of educational research, on the other hand, tends to be more concerned with cohorts defined on 
the basis of shared educational events. A school cohort such as a “ninth grade class” might consist of 
students who started grade nine during a particular school year. Often the study of high school outcomes 
is conceptualized in terms of anticipated events such as the expected year of high school completion. The 
latter, of course, is associated with the year in which students first start high school. Assuming on-time 
grade progression, students who first enroll in the ninth grade during the fall of 2000 would be members 
of the graduating high school class of 2004. 

The notion of a student cohort is a relatively straightforward distinction in conceptual terms. Capturing true
student cohorts empirically when calculating GCD rates, however, can be a much more challenging 
undertaking in practical terms. Many data systems, for instance, do not collect the precise information 
needed to ascertain an individual’s cohort membership. One database might report the total number of 
ninth graders in a particular year while another disaggregates enrollment counts separately by first-time 
and repeat ninth graders. Only in the latter case could an entering high school cohort be identified. 
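
A brief sketch may help fix these ideas. The Python below assumes a four-year on-time progression and uses invented enrollment counts; the function name and the dictionary layout are hypothetical. It shows why a total ninth-grade count cannot stand in for an entering cohort unless first-time students are reported separately.

```python
def graduating_class(entry_fall_year, years_to_complete=4):
    """Expected graduation year assuming on-time grade progression."""
    return entry_fall_year + years_to_complete

# Hypothetical ninth-grade enrollment for fall 2000, split by repeater status.
ninth_graders_2000 = {"first_time": 950, "repeat": 50}

# Only the first-time count identifies the entering cohort. The grade total
# mixes that cohort with students who started high school in an earlier year.
entering_cohort = ninth_graders_2000["first_time"]
grade_total = sum(ninth_graders_2000.values())

print(graduating_class(2000))        # 2004
print(entering_cohort, grade_total)
```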

The illustrations used in this section could generally be said to describe “closed” systems in which the 
total number of individuals being enumerated does not change over time. Real-world data systems, 
however, are “open” in the sense that individuals may move into or out of the scope of observation (e.g., 
students transferring to or from a school district). An open system introduces the methodological 
challenge of being able to identify new students who enter a cohort in progress (or leave the cohort). But 
assuming that this empirical difficulty could be overcome, there may also be related conceptual 
complications to consider. For instance, we might ask whether it is appropriate to treat new students who 
transfer into a system as “true” members of the cohort. The answer to that particular question may 
depend on the purposes for which information about those students is used. 

4. What’s in a Rate?

4.1. A rate by any other name ...

The previous sections have attempted to lay out the conceptual groundwork underpinning the 
development of the GCD catalog compiled in this study. First, we considered the nature of student paths
through secondary education and the ways in which high school graduation, completion, and dropout are 
defined. Then, we discussed key properties of the information systems that can be used to compile data 
about high school outcomes. In particular, the contingency table perspective presented above offered a 
potentially useful way to conceptualize aggregate-level data systems. But it also helped to introduce the 
notation conventions used in the mathematical formulas that will appear later in the catalog entries. This 
section turns to another central question in the study of GCD rates — exactly what we mean by a “rate.” 

In its most general or colloquial sense, a rate is one number divided by another and is typically expressed 
numerically as a small decimal value. Much as was the case with the term “cohort” discussed above, 
researchers often employ the term “rate” with a more particular, technically specific definition in mind. In 
the more technical sense used in educational research or social science, a rate generally expresses the 
percentage of an eligible population that experiences a particular event during a particular interval of time. 
A rate can be depicted using a very basic formula as follows. 



         Population subset experiencing an event                 E
Rate  =  -------------------------------------------------  =  -----
         Total population at risk of experiencing an event       P



Several properties of a rate, defined in these terms, are worth noting. The general logic of a rate implies a 
temporal dimension — events occurring over some period of time. This temporal dimension, however, may 
be more or less well defined. The focal event may reference a specific interval of time — a particular year, 
month, or even day. But an event of interest could also refer to something that occurred at some 
unspecified time in the past or during a broadly defined period, rather than during a narrowly 
circumscribed time interval. 

The following two descriptions are examples of different types of rates, dropout rates in particular: 

> The percentage of students starting the ninth grade during a given school year who drop out 
before the end of that academic year 

> The percentage of young adults age 16-24 who have not completed at least a high school 
education 

The former type of indicator possesses a clearly defined time frame — a single academic year. A measure 
of this kind is often referred to as an event rate. The latter example, often called a status rate, 
possesses a less clearly defined temporal dimension. The status rate’s lack of specificity can be seen with
respect to both the population of interest (e.g., the broad range of ages considered young adulthood) and
the timing of the event of interest (e.g., no reference to the year or grade level during which an individual
left school). These distinctions can be expressed in somewhat different terms. Event indicators essentially
capture incidence: the rate at which an event occurs over a particular period of time. Status indicators,
on the other hand, essentially measure prevalence: the proportion of a population having a particular
(event-defined) characteristic at a particular point in time.

The Urban Institute / Education Policy Center

GCD Indicator Catalog
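
The contrast between incidence and prevalence can be illustrated with a short sketch. The Python below applies the same generic rate calculation to two sets of invented counts, one for each of the dropout descriptions given above. All figures and variable names are hypothetical.

```python
def rate(events, population_at_risk):
    """Generic rate: share of the at-risk population experiencing the event."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return events / population_at_risk

# Event rate (incidence): hypothetical counts for one academic year.
ninth_graders_starting = 1000
dropped_out_during_year = 45
event_rate = rate(dropped_out_during_year, ninth_graders_starting)

# Status rate (prevalence): hypothetical counts at one point in time.
# When each person left school plays no part in the calculation.
young_adults_16_to_24 = 8000
without_high_school_education = 880
status_rate = rate(without_high_school_education, young_adults_16_to_24)

print(round(event_rate, 3), round(status_rate, 3))
```

The arithmetic is identical in the two cases. What differs is the definition of the population and the time frame attached to the event.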

The basic logic of a rate would also suggest that individuals counted in the numerator of the equation 
(i.e., those experiencing the event) must also be counted in the denominator (i.e., the risk set or 
population of individuals who could potentially experience the event). This characteristic of a rate is often 
assumed in technical discussions where rigor and precision are chief concerns. However, the term “rate” 
is often used in practice to refer to measures or indicators where it is not possible to empirically reconcile 
observed events with the individuals at risk. In fact, some of the official federal and state statistics 
described in the GCD catalog share this property.

Such a situation may arise as a result of the mathematical properties of a statistic or the practical 
limitations of an information system. For example, a researcher could try to approximate a graduation rate 
for a particular school district that has a very rudimentary data system. A basic indicator might be 
constructed by dividing the number of diploma recipients in the spring of 2004 by the number of ninth 
graders in the fall of 2000 (Formula 1 below). Due to a variety of factors not captured by the available 
data — grade retention, transfer into the system, migration out — the students in the numerator may not be 
a subset of the specific individuals in the denominator. That is, some of those 2004 graduates may have 
started high school before 2000, while others may have been enrolled in a different school district at that 
point in time. Alternatively, a more sophisticated database might contain information on the numbers of 
first-time ninth graders in 2000 and on-time graduates in 2004. Assuming no transfer into or out of the 
district, this would allow a researcher to employ a mathematical formula in which individuals in the 
numerator also appear in the denominator (see Formula 2 below). Information in real-world data systems, 
of course, may at times be incomplete or recorded with error. In such a situation, the case-by-case 
correspondence between numerator and denominator that exists in principle may not strictly pertain in 
practice. 



Formula 1:  (All diplomas in spring 2004) / (All ninth graders in fall 2000)

Formula 2:  (On-time diplomas in spring 2004) / (First-time ninth graders in fall 2000)
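
To see how the two formulas can diverge, consider the following sketch. The district counts are invented for illustration and do not come from any real data system; the variable names are likewise hypothetical.

```python
# Hypothetical district counts; none of these figures come from real data.
all_ninth_graders_fall_2000 = 1000         # includes repeat ninth graders
first_time_ninth_graders_fall_2000 = 920
all_diplomas_spring_2004 = 780             # includes late and transfer-in graduates
on_time_diplomas_spring_2004 = 690         # earned by fall 2000 first-time entrants

# Formula 1: the numerator is not guaranteed to be a subset of the denominator.
formula_1 = all_diplomas_spring_2004 / all_ninth_graders_fall_2000

# Formula 2: assuming no transfers, every on-time graduate is also counted
# among the first-time ninth graders, so the ratio cannot exceed 1.
formula_2 = on_time_diplomas_spring_2004 / first_time_ninth_graders_fall_2000

print(round(formula_1, 3), round(formula_2, 3))
```

With these particular counts the two indicators differ by three percentage points, even though both describe the same hypothetical district.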



Empirically imperfect measures such as these are commonly referred to as rates by educational 
practitioners, policymakers, the public, and many researchers. From the more technically oriented 
perspective mentioned above, however, it could be argued that a definitive reconciliation of cases in the 
numerator and denominator of a measure constitutes an essential characteristic of a “rate.” As such, it 
might be more prudent (or at least more precise) to describe such imperfect measures as rate indicators 
or rate estimators rather than “rates” per se. This report will generally refer to empirical GCD measures as 
rate indicators in order to acknowledge these considerations. 

4.2. Types of Empirical Rate Indicators

As the discussion above suggests, real-world data constraints may place practical limitations on empirical 
measures of high school graduation, completion, and dropout. While a true rate may generally be 
preferable, calculating a more imprecise rate indicator may be the only feasible option in many situations 
in light of data limitations. In general, it is useful to think of an empirical measure or indicator of any kind 
as constituting a union of mathematical formula and observed data. This section describes and defines 
the rate indicator classifications used to categorize the statistics appearing in the GCD catalog.

4.2.1. Indicator Categories Used by NCES

The terminology commonly used by researchers to describe different types of graduation, completion, 
and dropout rates has been strongly influenced by a series of reports published by the U.S. Department 
of Education’s National Center for Education Statistics (NCES). These reports (Dropout Rates in the
United States) distinguish among status, event, and cohort rates. The definitions used by NCES to 
describe these three kinds of rates appear in Exhibit 8. Although these definitions refer specifically to 
dropout rates, similar ones could be applied to high school graduation and completion. 



Exhibit 8: NCES Classification of Dropout Rates 



Event rates describe the proportion of students in a given age range who 
leave school each year without completing a high school program. 

Status rates provide cumulative data on dropouts among all young adults 
within a specified age range. Status rates are higher than event rates 
because they include all dropouts in a given age range, regardless of 
when they last attended school. 

Cohort rates measure what happens to a group of students over a period 
of time. These rates are based on repeated measures of a cohort of 
students with shared experiences and reveal how many students starting 
in a specific grade drop out over time. 



Source: Dropout Rates in the United States: 2000 (NCES 2002) 



The classification of rate indicators developed for the present report will employ these terms (status, 
event, cohort) and will also introduce new categories. However, as described below, these terms will be 
defined in a somewhat different manner than has become customary. Several considerations led to this 
decision to depart, at least in a minor way, from the common usage of these terms. 

> The definitions employed by NCES imply a categorical distinction between event and cohort 
rates. It will be suggested here, however, that cohort rates can be more usefully viewed as a 
subcategory of event rates. As described below, we will distinguish among specific types of event 
rate indicators based on the types of data used to develop an empirical measure. 

> The term “cohort” has also come to be used in a rather narrow way in the NCES publications.
Increasingly, cohort rates have become associated specifically with measures that are based on 
data obtained from individual students who are tracked over a period of time. This narrow 
connotation would exclude from consideration some statistics that are based on group-level data 
and could legitimately be viewed as “cohort” rate indicators. Earlier installments of the Dropout 
Rates in the United States series, in fact, cited examples of such group-level cohort rates. 

> These three broad categories are not sufficient to capture the considerable diversity of GCD 
indicators in use today by federal and state agencies. In fact, most such real-world statistical
indicators may be most appropriately classified in other categories, which often fall somewhere in 
between the status, event, and cohort distinctions. 



4.2.2. Indicator Classification Used in this Catalog 

The classification scheme developed for this study’s catalog consists of two main classes of indicators: 
Status Rates and Event Rates. These broad categories resemble the definitions used by NCES in a 
variety of respects. In the current report, however, we distinguish among several different subtypes of 
event rates. It should be noted that the indicator classifications described below are generic in the sense 
that they can be applied in the same manner to graduation, completion, and dropout rates. 

Status Rates 

Status Rates are used to capture the prevalence of a particular condition or characteristic within a 
population at a single point in time. A status indicator can be thought of as a snapshot rather than an
ongoing observation over a period of time. As such, a status rate does not possess a temporal dimension in
the traditional sense and can be considered a synchronic indicator. 

As suggested earlier, however, it is sometimes the case that the condition of interest itself can be defined 
on the basis of a past event. Demographers, for instance, might calculate the marriage rate for a 
population — the percentage of individuals who are married at a particular point in time. In this case, 
current marital status implies having experienced or not experienced a particular event (i.e., getting 
married) at some point in the past. The status-defining event, however, occurred at an unspecified time in 
the past. This means that for a particular member of a population the duration between that event and the 
time of observation is not known. In other words, the amount of time spent “at risk” of experiencing the 
event varies within the population. 

There are many examples of status rates in educational research. Perhaps the most relevant to the 
present study would be educational attainment indicators. Such a measure typically reports the proportion 
of the adult population that has reached a particular level of education (e.g., attended school until a
certain grade or earned a particular credential). Status dropout, completion, or graduation rates are 
examples of educational attainment measures. 

We can also describe a rate indicator (in part) in terms of the kinds of data it requires (see Exhibit 9). As 
noted earlier, a status rate indicator is calculated using data observed at a single point in time (i.e., single 
cross-sectional data). Thinking back to our earlier illustrations, this means that a status rate can be 
computed using information that would appear within one temporally defined column of a contingency 
table. In effect, this would constitute a one-dimensional table, where the cells represent the frequency of 
different statuses at a single point in time. 
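
The one-dimensional case can be sketched as follows. The Python below treats a single cross-section as a set of status counts and computes each status rate as a share of the total. The status labels and counts are hypothetical.

```python
# Hypothetical single cross-section: educational status of adults at one time.
# This is one "column" of a contingency table; there is no second time point.
status_counts = {
    "diploma": 6200,
    "alternative_credential": 600,
    "no_credential_not_enrolled": 900,
    "still_enrolled": 300,
}

population = sum(status_counts.values())

# Status rates are simple shares of the population in each condition.
status_rates = {status: n / population for status, n in status_counts.items()}

print(round(status_rates["no_credential_not_enrolled"], 4))
```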

Event Rates



Event Rates capture the incidence of an occurrence within a population of interest during a particular 
interval of time. An event rate is a diachronic indicator, possessing an explicit temporal dimension. The 
duration of the time period being observed, however, can vary. We noted above that statuses can 
sometimes be defined in terms of past events. In a similar vein, the events that are the subject of event 
rate indicators can often be conceptualized as a transition from one status or condition to another. An 
annual marriage rate, for instance, measures the percentage of people who moved from an unmarried 
state to a married state over the course of a year. 

Returning again to the logic of contingency tables, event rates generally require data that can be 
expressed as a true two-dimensional data matrix. At a minimum, this would be a two-way contingency 
table where the rows represent statuses of interest and the columns represent multiple points in time. An 
event rate indicator captures differences in data across rows and columns. For a particular population, 
this would reflect the number of individuals who changed status (unmarried to married, enrolled to 
dropout) between observations. 
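
A minimal sketch of this logic appears below. It assumes paired observations of each individual's status at two points in time, which is a simplification; the status labels and the five invented cases are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (status at time y, status at time y+1).
observations = [
    ("enrolled", "enrolled"),
    ("enrolled", "dropout"),
    ("enrolled", "enrolled"),
    ("enrolled", "graduate"),
    ("enrolled", "enrolled"),
]

# Cross-tabulate the two time points into a two-way table of transitions.
transitions = Counter(observations)

# Risk set: everyone enrolled at the first observation.
at_risk = sum(n for (before, _after), n in transitions.items()
              if before == "enrolled")

# Incidence: those who moved from enrolled to dropout between observations.
event_dropout_rate = transitions[("enrolled", "dropout")] / at_risk
print(event_dropout_rate)
```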



Exhibit 9: Types of Data Required to Calculate Status and Event Rate Indicators

Columns (type of rate indicator):
    Status Rate
    Event Rate: Retrospective, Synthetic, Pseudocohort, Cohort, Panel

Rows (type of data system):
    Single Cross-Section, Contemporary
    Single Cross-Section, Retrospective
    Repeated Cross-Section
    Individually Tracked Longitudinal System

✓ = Rate indicator can be calculated using specified type of data.

[Table grid: the placement of the check marks within the grid is not recoverable from the source text.]



Types of Event Rates 

Within the broader category of event rates, it is possible to identify a number of notable variations on the 
general approach. These subtypes all possess the fundamental conceptual logic of an event. But this 
logic is operationalized in somewhat different ways. Often the main distinction among specific kinds of 
event rates pertains to the properties of available data, in particular the manner in which the temporal
dimension is captured. 

Of course, there are exceptions to every rule. And the rule implying that event rates should contain a 
temporal dimension is itself no exception. Specifically, there are certain situations where event rate 
indicators can be generated using cross-sectional data. 



Retrospective Event Rates 

Retrospective Event Rate indicators can be constructed from strictly cross-sectional observations where 
information is reported on the occurrence of past events. In other words, the data are collected at one 
point in time but the information that is captured refers to multiple time points. We might think of this 
information as representing a “virtual” column in the temporal dimension of a two-way contingency table. 
This would be virtual in the sense that the time point associated with the information is not the same as 
the time point at which the observation (i.e., data collection) is made. A survey, for example, might ask 
high school-age respondents whether they were enrolled in school a year ago and, if so, at what grade 
level. This information could be combined with responses to questions about the respondent’s current 
situation (such as enrollment and high school completion status at the time of the survey). By doing so, 
an analyst could indirectly reconstruct past events that the individual had experienced (e.g., dropping out 
of school during the past year). A retrospective approach, therefore, allows for the calculation of an event 
rate based on a cross-sectional data collection. It should also be added that a retrospective event rate is 
a “true” rate as opposed to an indirect rate “indicator” in the technical sense discussed earlier. That is, the 
individuals appearing in the numerator of the mathematical formula would be a subset of individuals in the 
denominator. 
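
The reconstruction described above can be sketched with a handful of invented survey responses. The field names below are hypothetical and do not correspond to the items of any actual survey instrument.

```python
# Hypothetical survey responses collected at a single point in time.
# "enrolled_last_year" is the retrospective item; the other two fields
# describe the respondent's situation at the time of the survey.
responses = [
    {"enrolled_last_year": True,  "enrolled_now": True,  "completed": False},
    {"enrolled_last_year": True,  "enrolled_now": False, "completed": True},
    {"enrolled_last_year": True,  "enrolled_now": False, "completed": False},
    {"enrolled_last_year": False, "enrolled_now": False, "completed": False},
    {"enrolled_last_year": True,  "enrolled_now": True,  "completed": False},
]

# Risk set: respondents who report having been enrolled a year ago.
risk_set = [r for r in responses if r["enrolled_last_year"]]

# Reconstructed event: enrolled a year ago, now neither enrolled nor completed.
dropouts = [r for r in risk_set
            if not r["enrolled_now"] and not r["completed"]]

retrospective_dropout_rate = len(dropouts) / len(risk_set)
print(retrospective_dropout_rate)
```

Because the dropouts are selected from within the risk set, the numerator is by construction a subset of the denominator.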



Synthetic Event Rates 

Synthetic Event Rate indicators employ observations on multiple temporally defined groups at a single 
point in time in order to approximate the experiences of a single hypothetical group over some period of 
time. In this case, the identification of these groups (at least indirectly) provides a temporal distinction, 
which is typically accomplished through the use of repeated observations in nonsynthetic rates. 
Calculating a standard (i.e., nonsynthetic) event graduation rate, on the one hand, might involve following 
a single class of ninth graders for four school years in order to determine how many had received a 
diploma by the end of that period. A synthetic version of this event rate, on the other hand, might compare 
the number of graduates to the number of ninth graders during a single school year. The validity of 
synthetic rates (compared to their nonsynthetic counterparts) typically relies on strong assumptions 
regarding the comparability of the groups of individuals from which data are obtained. Synthetic rate 
indicators do offer some advantages, which, given the right conditions, might at least partially 
compensate for their methodological weaknesses. In particular, a synthetic approach enables an analyst 
to approximate an event rate in situations where data have not been collected for a sufficient number of 
years to calculate a standard event rate. 
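
The difference between the two approaches can be sketched with invented counts. In the Python below, the enrollment and graduate figures are hypothetical, and the graduates are assumed for simplicity to come entirely from the class that entered four school years earlier.

```python
# Hypothetical counts; school years are labeled by their fall calendar year.
ninth_graders = {2000: 1000, 2003: 1100}
graduates_2003 = 760  # assumed here to all come from the class entering in 2000

# Standard approach: follow the class that entered in 2000 across four
# school years (2000 through 2003) and count its diplomas at the end.
standard_rate = graduates_2003 / ninth_graders[2000]

# Synthetic approach: compare graduates and ninth graders observed in the
# same school year, treating the two groups as if they were one cohort.
synthetic_rate = graduates_2003 / ninth_graders[2003]

print(round(standard_rate, 3), round(synthetic_rate, 3))
```

The two values agree only if the ninth-grade class observed in the final year is the same size as the class that actually produced the graduates, which is one form of the comparability assumption noted above.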

A synthetic rate can also be derived from a repeated cross-sectional database. This can happen when 
different pieces of information about a particular group of students are referenced to the same timeframe 
(e.g., a school year) but collected at different points in time. For example, an indicator for the dropout rate 
during year y might employ information about both enrollment counts and dropout counts. Enrollment 
counts for a given school year are usually taken in the fall of the school year. However, the collection of 
dropout data is usually lagged, with counts of year y dropouts obtained in the fall of year y+1 (see footnote 6). In
general, calculating an indicator that combines information on enrollment with end-of-year statuses
(graduation, completion, dropout) requires that the number of individual observation (or data collection)
points be one greater than the number of school years being observed.
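
The lag can be illustrated with a short sketch. The Python below assumes a hypothetical fall collection that reports both current enrollment and the dropouts from the school year just ended. The layout, field names, and counts are invented for illustration.

```python
# Hypothetical fall collections; each one reports that fall's enrollment and
# the dropouts from the school year that just ended.
collections = {
    2000: {"enrollment": 4000, "dropouts_prior_year": None},
    2001: {"enrollment": 4100, "dropouts_prior_year": 160},
    2002: {"enrollment": 4050, "dropouts_prior_year": 123},
}

def annual_dropout_indicator(collections, school_year):
    """Pair fall enrollment for `school_year` with the lagged dropout count."""
    enrolled = collections[school_year]["enrollment"]
    dropped = collections[school_year + 1]["dropouts_prior_year"]
    return dropped / enrolled

# Two school years (2000, 2001) require three collection points.
rates = {y: annual_dropout_indicator(collections, y) for y in (2000, 2001)}
print(rates)
```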

The final three types of event rates are calculated based on data containing multiple observations 
collected over time. These event rate indicators also all involve the logic of what might be called cohort 
analysis. As discussed earlier, a cohort is a group of individuals that experiences a common event during 
the same period of time. Cohort analysis, in its most basic sense, involves charting the characteristics of 
such a group over a period of time. This general logic of following a well-defined group of individuals 
across time is shared by all cohort-type event rates. However, this logic can be operationalized in 
different (more or less perfect) ways depending on the data available. In some respects, we might think of 
different types of cohort event rate indicators as being distinguished by their minimal data or informational 
requirements (see Exhibit 9). 



Cohort Event Rates 

Cohort Event Rate indicators calculate the proportion of cohort members who experience a particular 
event during a specified period of time. As noted above, a cohort itself is defined here as a group of 
individuals who share another earlier event in common. Like other event rates, we can conceptualize this 
indicator, in perhaps overly simple terms, as a fraction. The denominator (or risk set) for the indicator 
consists of the total number of group members (observed at time t) who could potentially experience the 
event of interest. The numerator (or incidence set) consists of the number of these group members who 
experienced the focal event between the initial observation of the group (t) and some later point in time 
(t+n). Repeated observations are, therefore, required in order to calculate a cohort rate. One example of a 
cohort event rate would be a four-year graduation rate for an entering high school class. This indicator 
could compare the number of first-time ninth graders (i.e., starting cohort members) to the number of 
those students who received a high school diploma within four years (the expected time of high school 
completion). A true cohort rate implies that both of these data elements (e.g., ninth grade enrollment and 
diploma counts) refer specifically to members of the focal cohort. This informational requirement can be 
met by a repeated cross-sectional database, provided the data system can disaggregate indicator 
elements by cohort membership. Individual cohort members need not be tracked prospectively over time 
in order to generate a cohort event rate. 
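
The fraction described above can be written compactly as follows. The symbols R, E, and N are our own shorthand, introduced only for this illustration.

```latex
\[
R_{t,\,t+n} \;=\; \frac{E_{t,\,t+n}}{N_{t}}
\]
% N_t         : number of cohort members observed at time t (the risk set)
% E_{t, t+n}  : number of those same members who experience the focal event
%               between t and t+n (the incidence set)
```

For a hypothetical entering class of 500 first-time ninth graders, 375 of whom receive a diploma within four years, the four-year cohort graduation rate would be 375/500 = 0.75.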



Panel Event Rates 

Panel Event Rate indicators are a specific kind of event rate measure that employs data generated from 
longitudinal data systems in which individual cohort members are tracked over time. In fact, we might 
think of this indicator in an even more narrow sense as a subtype of cohort event rates. For a standard 
cohort rate indicator, the focal group as a whole is followed over time. Panel event rates, however, track 
the individual members of this cohort. In some respects the differences between cohort and panel rate 
indicators can be quite minimal. At an aggregate level, for example, a panel rate may be indistinguishable 
from a cohort rate since both indicators would refer to the collective experience of the same group of 
individuals (i.e., a particular cohort). In the case of a panel rate, however, the temporal linkage in the data 
exists at the individual (as opposed to group) level. Consequently, panel rate data can be disaggregated 
to directly examine the event of interest for specific persons. A database consisting of longitudinally 
tracked records allows for the observation of individual change and the possibility of conducting more 
sophisticated statistical analyses that link together data elements from multiple sources. As a result, 
tracked data and panel rates are often viewed as a “gold standard” for the investigation of graduation, 
completion, dropout, and other outcomes. 

6 In many cases, when calculating an event indicator that combines information on enrollment with end-of-year 
statuses (graduation, completion, dropout), the number of required observation (data collection) points will be one 
greater than the number of school years being observed. This results from the lag in collection of end-of-year 
status for the final school year observed. 
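
To illustrate how individually tracked records support both an aggregate rate and its disaggregation, the sketch below computes a panel graduation rate from a handful of student-level records. The record layout, field names, subgroup labels, and data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    student_id: str
    entry_year: int              # fall the student first entered ninth grade
    diploma_year: Optional[int]  # spring the diploma was awarded, or None
    subgroup: str                # any characteristic used for disaggregation


def panel_graduation_rate(records, entry_year, years=4, subgroup=None):
    """Share of tracked cohort members earning a diploma within `years`."""
    cohort = [
        r for r in records
        if r.entry_year == entry_year
        and (subgroup is None or r.subgroup == subgroup)
    ]
    if not cohort:
        return None  # no cohort members, so no rate can be formed
    graduates = [
        r for r in cohort
        if r.diploma_year is not None and r.diploma_year <= entry_year + years
    ]
    return len(graduates) / len(cohort)


# Hypothetical tracked records, for illustration only.
records = [
    StudentRecord("A", 2000, 2004, "X"),
    StudentRecord("B", 2000, 2004, "X"),
    StudentRecord("C", 2000, 2005, "X"),  # graduated, but in five years
    StudentRecord("D", 2000, None, "Y"),  # no diploma observed
    StudentRecord("E", 2001, 2005, "Y"),  # member of a different cohort
]

overall = panel_graduation_rate(records, 2000)
group_x = panel_graduation_rate(records, 2000, subgroup="X")
group_y = panel_graduation_rate(records, 2000, subgroup="Y")
```

The overall figure is what a cohort rate would also report for this group. The subgroup rates, and the ability to see that student C graduated a year late, depend on the individual-level linkage.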

Pseudocohort Event Rates 

Pseudocohort Event Rate indicators are cohort-like empirical measures that attempt to estimate the 
incidence of an event among a cohort of interest. Compared to true cohort rates, the major limitation of a 
pseudocohort indicator is that it utilizes repeated cross-sectional data in which cohort membership for 
individuals in a population cannot be directly or precisely observed. Analysts often employ these types of 
indicators in situations where data collection systems do not contain detailed information on the timing of 
past experiences. Suppose, for instance, that the cohort of interest is an expected graduating class. This 
might be the group of students who start ninth grade in the fall of 2000 (and who are in turn expected to 
graduate in the spring of 2004). To calculate a true cohort graduation rate, we would need data on ninth 
grade enrollment in 2000 and diploma status in 2004, broken down specifically for members of this group 
(i.e., first-time ninth graders in 2000). Suppose instead that the data actually available told us the total 
number of 2000 ninth graders and number of 2004 diplomas but not how many of these students started 
high school as ninth graders in 2000. In other words, we do not know who was and was not a member of 
the focal cohort. In this type of situation, a pseudocohort rate indicator would attempt to approximate the 
experience of the cohort as closely as possible using the available information (2004 diplomas divided by 
2000 ninth graders). As this example suggests, a pseudocohort indicator possesses the general logic of a 
cohort rate. It reflects the temporal milestones of the actual cohort of interest, and data are observed at 
the time points when true cohort members are expected to experience these key events (i.e., starting and 
finishing high school). Due to practical data limitations, however, cohort identification cannot be fully 
operationalized. 
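
The contrast between the two calculations in this example can be sketched as follows. The function names and every count are hypothetical. The composition of the 2000 ninth-grade class and of the 2004 diplomas is assumed here only to show how the two rates can diverge.

```python
def pseudocohort_graduation_rate(ninth_graders, diplomas, start_year, years=4):
    """Approximate a cohort graduation rate from two aggregate snapshots.

    ninth_graders[y] : total ninth-grade enrollment in the fall of year y
    diplomas[y]      : total diplomas awarded in the spring of year y
    Cohort membership is not observed, so repeaters and transfers are mixed in.
    """
    base = ninth_graders[start_year]
    if base == 0:
        return None
    return diplomas[start_year + years] / base


def true_cohort_graduation_rate(first_time_ninth_graders, cohort_diplomas):
    """Rate when both counts are restricted to members of the focal cohort."""
    if first_time_ninth_graders == 0:
        return None
    return cohort_diplomas / first_time_ninth_graders


# Hypothetical counts, for illustration only.
ninth_graders = {2000: 1000}  # assumed to include 100 students repeating ninth grade
diplomas = {2004: 700}        # assumed to include 50 diplomas earned by other cohorts

pseudo = pseudocohort_graduation_rate(ninth_graders, diplomas, 2000)
true = true_cohort_graduation_rate(900, 650)
print(f"pseudocohort: {pseudo:.3f}, true cohort: {true:.3f}")
# prints "pseudocohort: 0.700, true cohort: 0.722"
```

With these assumed counts the pseudocohort indicator understates the true cohort rate. Under other assumptions (for example, substantial in-migration after ninth grade) it could overstate it instead.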

5. Major Sources of Public Information on GCD Rates 

5.1. Federal Information 

Since the late 1980s, the most influential source of information about graduation, completion and dropout 
rates has been the series of Dropout Rates in the United States reports published by the U.S. Department 
of Education. These reports were initially produced in response to a congressional mandate to annually 
document the state of high school dropout in the nation. Specifically, in 1988 the new authorizing legislation 
for the National Center for Education Statistics contained three mandates for the center. 7 

> Conduct an annual survey of dropout and retention rates 

> Report a dropout rate for a 12-month period to Congress every year 

> Establish a Special Task Force on Dropout and Retention Rates to develop and test an effective 
methodology for measuring dropout and retention rates 

The Department of Education has continued to publish these reports even after the authorizing legislation 
expired. Annual reports are available for the years 1988 through 2000, with further installments 
anticipated in the future. 

These reports compile the most current information available from several major data sources in order to 
provide up-to-date statistics on high school dropout, completion, and graduation rates for the nation as a 
whole. Analyses present findings for particular demographic subgroups, the states, and other 
geographical areas where data permit. In some years, more extensive supplemental analyses also focus 
on particular populations (e.g., immigrants or individuals with disabilities) or special topics (e.g., grade 
retention patterns, reasons for dropout, or the economic returns to high school completion). 

These reports are widely cited in governmental, policy, media, and research circles and have helped to 
promote the popular distinction among status, event, and cohort rates described earlier. The exact GCD 
statistics being reported have evolved slightly over time (e.g., the age range to which a particular rate 
refers). The specific slate of analyses presented also differs somewhat from year to year. Cohort rates 
derived from long-term longitudinal studies, for example, are only available on a periodic basis. A 
summary of GCD rates appearing in these reports by year can be found in Exhibit 10. This table uses the 
NCES rate categories by which the statistics are identified in the original publications (rather than the new 
classification developed for the current study). 

Three major data sources appearing in the Dropout Rates in the United States reports are the Current 
Population Survey, Common Core of Data, and longitudinal studies conducted by the U.S. Department of 
Education. The latter category would include the High School and Beyond Study (HSB) initiated in the 
early 1980s and the National Education Longitudinal Study of 1988 (NELS) fielded a decade later. These 
databases have been the leading source of publicly accessible information on GCD rates for several 
decades. While an understanding of the methodological intricacies of these sources would lend an 
appreciation of past and future approaches to developing GCD indicators, a systematic treatment of this 
topic is well beyond the scope of the present investigation. Fortunately, these data sources and the 
details of their methodological designs are well documented and readily accessible. 8 



7 Additional information can be found in the first report in this series, Dropout Rates in the United States: 1988 
(National Center for Education Statistics, 1989). 

8 Overviews of these respective databases and citations for further reading can be found in the Department of 
Education’s Dropout Rates in the United States publications. The catalog entries for GCD indicators presented later 
in the current report also include citations that provide more in-depth methodological discussions, particularly for the 
Current Population Survey and the Common Core of Data. 

Exhibit 10: Sources for GCD Rates Reported in Dropout Rates in the 
United States Reports (U.S. Department of Education) 

Columns: Year; Dropout Rates (Event, Status, Cohort); Completion Rates (Event, Status, Cohort); 
Graduation Rates (Event, Status, Cohort); Retention/Persistence Rates 1 

Sources are listed for each report year in column order, left to right; sources sharing a cell are 
joined with "/", and empty cells are not shown. 

1988: CPS; CPS; HSB; CPS; (HSB) 2; Various Sources; CPS 
1989: CPS; CPS; CPS; CPS; CPS 
1990: CPS; CPS; NELS; CPS; CPS; CPS/NELS 
1991: CPS; CPS/Census; CPS/NELS; CPS; (HSB); (SASS); CPS; (HSB); CPS/NELS; CPS 
1992: CPS; CPS; NELS/(HSB); CPS; (HSB); CPS; (HSB); CPS; CPS 
1993: CPS; CPS; NELS/(HSB); CPS; (SASS); CPS; CPS 
1994: CPS/CCD 3; CPS; NELS/(HSB); CPS 4; CPS; NELS; CPS 
1995: CPS/CCD; CPS; NELS/(HSB); CPS; CPS; CPS 
1996: CPS/CCD; CPS; NELS/(HSB); CPS; CPS; CPS 
1997: CPS/CCD; CPS; CPS; CPS 
1998: CPS/CCD; CPS; CPS; CPS 
1999: CPS/CCD; CPS; CPS; CPS 
2000: CPS/CCD; CPS; CPS 

Source: Dropout Rates in the United States reports, 1988-2000 (National Center for Education Statistics) 

Abbreviations: Census = U.S. Decennial Census; CPS = Current Population Survey; CCD = Common Core of Data; HSB = High School and 
Beyond; NELS = National Education Longitudinal Study; SASS = Schools and Staffing Study. 

Notes 

1. School Retention/Persistence refers to the percent of students who remain enrolled in school (as opposed to dropping out). This statistic is 
calculated by subtracting the event dropout rate from one. 

2. Sources in parentheses indicate secondary data sources that are only briefly cited or used for purposes of comparison with more current 
data. 

3. Starting in 1994, the Common Core of Data was used to estimate state-level event dropout rates. The number of states for which the 
necessary data are available varies from year to year (ranging from 17 in 1994 to 37 in 2000). 

4. Starting in 1994, the Current Population Survey was used to generate 3-year moving averages of the status completion rate at the state 
level. 

5.2. State Information 

In addition to these federal data sources, student information systems maintained by state education 
agencies have become an increasingly prominent source of information about high school outcomes. This 
is particularly true since the passage of the No Child Left Behind Act, which required that all states 
calculate and publicly report high school graduation rates as part of their accountability systems. Data 
generated by the states themselves will be the basis for the most locally visible and consequential high 
school completion statistics, since these data will be used to determine whether or not a school makes 
adequate yearly progress (AYP) under the law. Schools and districts repeatedly failing to meet 
performance standards will be identified as in need of improvement and subject to a series of 
progressively aggressive corrective actions. 

Individual state student information systems have evolved largely independently over 
the past several decades and have been greatly influenced by a variety of local factors. As a result, 
features of these data systems vary considerably across the states with respect to their structure, 
comprehensiveness, and technical sophistication. At one extreme, some state information systems 
consist of relatively basic aggregate-level data collected as cross-sectional snapshots on an annual basis. 
At the opposite end of the spectrum, other states have developed statewide student tracking systems that 
follow individual students over time through the use of a unique student identification code. This student 
identifier also provides a potential mechanism for constructing an even more extensive database of 
student information compiled from a wide variety of administrative sources (provided that these individual 
databases employ the same student identification code). 
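
As a rough sketch of the linkage such an identifier makes possible, the example below merges records from two administrative sources that share a student identification code. The source names, field names, and identifiers are hypothetical and do not describe any particular state system.

```python
def link_by_student_id(*sources):
    """Combine per-student fields from several sources keyed by the same ID."""
    linked = {}
    for source in sources:
        for student_id, fields in source.items():
            # Create the student's combined record on first sight, then add fields.
            linked.setdefault(student_id, {}).update(fields)
    return linked


# Hypothetical administrative sources, for illustration only.
enrollment = {
    "S001": {"grade": 9, "school": "North HS"},
    "S002": {"grade": 10, "school": "South HS"},
}
exit_records = {
    "S001": {"exit_code": "graduate"},
    "S003": {"exit_code": "dropout"},
}

linked = link_by_student_id(enrollment, exit_records)
```

Students who appear in only one source (S002 and S003 here) end up with partial records, which is one practical reason the sources must employ the same identification code consistently.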

Often these administrative record systems will generate information that is not directly comparable to data 
from other states. Among other reasons, this may result from differences in the way the data are 
collected. Some states collect only cross-sectional aggregate data while others maintain records of 
individually tracked students. The record keeping systems and administrative procedures of individual 
states can also differ considerably. Noncomparability of high school outcome statistics, however, may 
also be a result of differences in the organization of the state educational systems themselves. For 
instance, some states only offer a single high school completion credential while others offer multiple 
completion options. The requirements that students must meet to receive a given credential also vary 
across the states (e.g., students must pass a high school exit exam to receive a diploma in some states 
but not others). 

The types of data at hand exert a strong practical influence on the ways in which states measure 
graduation, completion, and dropout rates. As noted earlier, certain kinds of empirical indicators require 
particular types of data. A comprehensive review of state information systems is beyond the scope of the 
current investigation. The reader should keep in mind, however, that the state indicators detailed in the 
catalog have been shaped in part by the limitations of their respective data systems. Many states appear 
to be moving in the direction of developing more comprehensive student information systems, particularly 
those capable of tracking individual students. This raises the possibility that states will revise and refine 
their GCD indicators in the future as these new data systems become operational. 

6. Overview of the GCD Catalog 

6.1. The Review and Analytic Strategy 

The particular set of GCD indicators described in the catalog entries in Section 7 of this report was 
identified through an extensive review of publicly accessible policy documents, statistical reports, 
methodological documentation, and databases. Although not exhaustive, this review does capture the 
considerable range of approaches to calculating GCD rates currently in use by federal and state 
agencies. Our initial analysis of these indicators served as the basis for developing and refining the 
classification scheme for GCD rates described earlier. This framework was, in turn, used to design the 
format for the catalog entries and categorize the individual indicators examined. 

6.1.1. Federal GCD Indicators 

As one would expect, the U.S. Department of Education is the federal agency most involved in producing 
information about high school completion and dropout. Within the Department, the National Center for 
Education Statistics takes a leading role in collecting data and publicly disseminating information about 
high school dropout, completion, and graduation. In some cases, these statistics are generated from 
studies supported jointly by the Department of Education and other federal agencies. The Current 
Population Survey, for instance, is a large monthly survey administered by the U.S. Census Bureau 
(Department of Commerce) on behalf of the Bureau of Labor Statistics (Department of Labor), with the 
Department of Education providing support for a school enrollment supplement included as part of the 
October CPS installment. Often, the information on high school outcomes that other federal agencies 
incorporate into their own publications is obtained directly or indirectly from Department of Education 
sources (e.g., the Dropout Rates in the United States reports). In some cases, however, other federal 
agencies do maintain independent data collection and reporting efforts that generate original public 
information about high school outcomes and educational attainment levels more generally. A number of 
these other federal indicators are described in the catalog. 

6.1.2. State Graduation Rate Indicators 

Conducting a comprehensive review of all GCD statistical indicators in current use by the 50 states and 
the District of Columbia would present a daunting task. In order to provide a systematic yet meaningful 
treatment in the context of the current study, it was necessary to focus our state-level review on the one 
high school outcome for which information about state statistical indicators is readily and systematically 
available — high school graduation rates. The Title I accountability provisions of the No Child Left Behind 
Act (NCLB) required every state to calculate and publicly report high school graduation rates as part of 
its statewide accountability system. 9 Each state described its NCLB accountability plan 
in a document known as the Consolidated State Application Accountability Workbook, part of which 
presented the state’s method for computing the high school graduation rate. Federal regulations for Title I 
afforded the states a significant degree of flexibility in selecting graduation rate indicators that could be 
calculated using the kinds of data available in their respective information systems. 



9 The No Child Left Behind Act of 2001 was signed into law as Public Law 107-110 on January 8, 2002. The 
accountability provisions that are of concern in this paper are contained in Title I, Part A of the act. More specifically, 
the graduation rate definition is located at 20 U.S.C. 6311(b)(2)(C)(vi); 115 Stat. 1447. 